dnadna.transforms
Data transforms that can be applied during training.
Functions
Remove sites that are no longer polymorphic in the sample.
Classes
Compose: Pseudo-transform that composes multiple transforms by applying them in order one after the other.
Crop: Crop the SNP matrix and position array to a maximum size.
ReformatPosition: Changes the format of the input position array.
Rotate: Given a sequence, return a random rotation of it along the SNP axis.
SnpFormat: This transform specifies in what format the SNP matrix and position arrays are combined to form the input to the network.
Subsample: Subsample a SNP matrix of size (n, k), with n individuals and k SNPs, and return a matrix of size (m, l), with m < n individuals and l <= k SNPs, because columns that no longer contain a SNP are not kept.
Transform: Dataset transform.
ValidateSnp: A special transform that does not actually modify the data, but merely performs certain verifications on it.
Exceptions
InvalidSNPSample: Exception raised when a sample doesn't meet the minimum requirements for the dataset.
TransformException: Exception raised when applying a Transform to an input.
- class dnadna.transforms.Compose(transforms)[source]
Bases:
object
Pseudo-transform that composes multiple transforms by applying them in order one after the other.
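The "apply in order, one after the other" behaviour can be illustrated with a minimal sketch. This is not the dnadna implementation: `ComposeSketch`, `double_first` and `add_one_first` are hypothetical names, and the item passed between transforms is modelled as a plain tuple.

```python
class ComposeSketch:
    """Hypothetical stand-in: chain callables, feeding each the previous output."""

    def __init__(self, transforms):
        self.transforms = list(transforms)

    def __call__(self, item):
        # Apply the transforms strictly in the order they were given.
        for transform in self.transforms:
            item = transform(item)
        return item


def double_first(item):
    """Illustrative transform: double the first element of the tuple."""
    return (item[0] * 2,) + tuple(item[1:])


def add_one_first(item):
    """Illustrative transform: add one to the first element of the tuple."""
    return (item[0] + 1,) + tuple(item[1:])


pipeline = ComposeSketch([double_first, add_one_first])
result = pipeline((3, None, None))  # (3 * 2) + 1 -> (7, None, None)
```

Swapping the two transforms gives `(8, None, None)` instead, which is why the order of the list matters.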
- class dnadna.transforms.Crop(max_snp=None, max_indiv=None, keep_polymorphic_only=True)[source]
Bases:
Transform
Crop the SNP matrix and position array to a maximum size.
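- Parameters:
keep_polymorphic_only (bool) – If true, SNPs that are no longer polymorphic after cropping are removed.
- Keyword Arguments:
max_snp (int) – (optional) – Maximum number of SNPs to crop the sample to.
max_indiv (int) – (optional) – Maximum number of individuals to crop the sample to.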
- name = 'crop'
The user-facing name of the plugin, which can be provided by a user implementing a plugin.
Typically it is automatically the same as the internal Pluggable._name but users are free to provide their own custom name here when implementing a plugin.
- plugin_url = 'py-obj:dnadna.schemas.plugins.transform.crop'
Base URL for all DNADNA plugins.
New plugins’ schemas can be found relative to this URL unless this attribute is explicitly overridden by the class implementing the plugin.
- schema = {'description': 'Crop the SNP matrix and position array to a maximum size', 'properties': {'keep_polymorphic_only': {'default': True, 'description': 'After subsampling or cropping the individuals dimension of a SNP matrix, if some sites are no longer polymorphic they will be removed if keep_polymorphic_only=True ', 'type': ['boolean']}, 'max_indiv': {'default': None, 'description': 'Maximum number of individuals to crop dataset outputs to, set this to less-than-or-equal to preprocessing.min_indiv to ensure that all samples have the same number of individuals, as some nets require a fixed number of individuals.', 'minimum': 1, 'type': ['integer', 'null']}, 'max_snp': {'default': None, 'description': 'Maximum number of SNPs to crop dataset outputs to. Set this to less-than-or-equal to preprocessing.min_snp to ensure that all samples have the same number of SNPs, as some nets require a fixed number of SNPs.', 'minimum': 1, 'type': ['integer', 'null']}}, 'type': 'object'}
Schema for the plugin’s configuration, if any.
It can be either a string containing the name (without the .yml extension) of a schema in the default schema path (for built-in plugins) or a dict representing the schema.
Not all plugins must have schemas.
For now a class can only be a subclass of one pluggable; in other words a single class cannot provide multiple plugin interfaces.
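As a rough illustration of what cropping to a maximum size means, the sketch below trims a SNP matrix (rows are individuals, columns are SNPs) and its position array. It assumes the first `max_indiv` rows and first `max_snp` columns are kept, which the documentation above does not state; `crop_sketch` is a hypothetical helper, and the filtering controlled by `keep_polymorphic_only` is left out.

```python
def crop_sketch(snp, pos, max_snp=None, max_indiv=None):
    """Keep at most max_indiv rows and max_snp columns (None means no limit).

    Assumption: the leading rows and columns are the ones kept.
    """
    rows = snp[:max_indiv]                       # slicing with None keeps all rows
    cropped_snp = [row[:max_snp] for row in rows]
    cropped_pos = pos[:max_snp]                  # positions follow the SNP axis
    return cropped_snp, cropped_pos


snp = [
    [0, 1, 0, 1],
    [1, 0, 0, 1],
    [0, 0, 1, 1],
]
pos = [5, 460, 900, 952]

cropped_snp, cropped_pos = crop_sketch(snp, pos, max_snp=3, max_indiv=2)
# cropped_snp == [[0, 1, 0], [1, 0, 0]], cropped_pos == [5, 460, 900]
```

Samples already smaller than the limits pass through unchanged, so cropping alone does not guarantee a uniform shape across a dataset.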
- exception dnadna.transforms.InvalidSNPSample(msg, sample=None)[source]
Bases:
Exception
Exception raised when a sample doesn’t meet the minimum requirements for the dataset.
Used by ValidateSnp.
- class dnadna.transforms.ReformatPosition(distance=None, normalized=None, circular=None, chromosome_size=None, initial_position=None)[source]
Bases:
Transform
Changes the format of the input position array.
It can change from normalized/unnormalized positions, and can convert between distance and absolute position formats.
When initializing this transform it is only necessary to specify those parameters that you explicitly want to convert.
Warning
This transform should be applied before any other transforms (e.g. rotate) which can modify the position orders, since this transform assumes positions are all in increasing order.
- Keyword Arguments:
distance (bool) – (optional) – If True, change positions to distances or vice-versa; if left unspecified the current position format is kept.
normalized (bool) – (optional) – Divide SNP positions/distances by chromosome size? If True, unnormalized positions are converted to normalized positions and vice-versa; if left unspecified the current normalization is kept. The chromosome_size argument is also required when changing the normalization, unless the chromosome_size is already specified on the inputs.
chromosome_size (int) – (optional) – Length of the chromosome; required when transforming from normalized to unnormalized positions. If left unspecified, but the input SNPSample has a chromosome_size in its pos_format, then it will be used.
circular (bool) – (optional) – Chromosome should be treated as circular when performing the transformation. Normally the input’s circularity is kept.
initial_position (int or float) – (optional) – A position to use as the initial position when converting from circular positions.
Examples
>>> from dnadna.snp_sample import SNPSample
>>> from dnadna.transforms import ReformatPosition
>>> import numpy as np
Initial example with unnormalized absolute positions and chromosome_size = 1000:
>>> sample = SNPSample(np.eye(4), [5, 460, 900, 952],
...                    pos_format={'normalized': False, 'distance': False,
...                                'chromosome_size': 1000})
>>> xf = ReformatPosition(normalized=True)
>>> xf((sample, None, None))[0]
SNPSample(
    snp=tensor(...),
    pos=tensor([0.0050, 0.4600, 0.9000, 0.9520], dtype=torch.float64),
    pos_format={'normalized': True, 'distance': False, 'chromosome_size': 1000}
)
>>> xf = ReformatPosition(distance=True)
>>> xf((sample, None, None))[0]
SNPSample(
    snp=tensor(...),
    pos=tensor([ 5, 455, 440, 52]),
    pos_format={'normalized': False, 'distance': True, 'chromosome_size': 1000}
)
>>> xf = ReformatPosition(distance=True, normalized=True)
>>> dist_norm = xf((sample, None, None))[0]
>>> dist_norm
SNPSample(
    snp=tensor(...),
    pos=tensor([0.0050, 0.4550, 0.4400, 0.0520], dtype=torch.float64),
    pos_format={'normalized': True, 'distance': True, 'chromosome_size': 1000}
)
Convert from normalized distances back to unnormalized positions:
>>> xf = ReformatPosition(distance=False, normalized=False)
>>> xf((dist_norm, None, None))[0]
SNPSample(
    snp=tensor(...),
    pos=tensor([ 5, 460, 900, 952]),
    pos_format={'normalized': False, 'distance': False, 'chromosome_size': 1000}
)
Convert from normalized linear distances to circular distances:
>>> xf = ReformatPosition(circular=True, initial_position=0.005)
>>> xf((dist_norm, None, None))[0]
SNPSample(
    snp=tensor(...),
    pos=tensor([0.0530, 0.4550, 0.4400, 0.0520], dtype=torch.float64),
    pos_format={'normalized': True, 'distance': True, 'chromosome_size': 1000, 'circular': True, 'initial_position': 0.005}
)
Convert from positions to circular distances:
>>> xf = ReformatPosition(distance=True, circular=True)
>>> xf((sample, None, None))[0]
SNPSample(
    snp=tensor(...),
    pos=tensor([ 53, 455, 440, 52]),
    pos_format={'normalized': False, 'distance': True, 'chromosome_size': 1000, 'circular': True, 'initial_position': 5}
)
>>> xf = ReformatPosition(distance=True, normalized=True, circular=True)
>>> circ_norm = xf((sample, None, None))[0]
>>> circ_norm
SNPSample(
    snp=tensor(...),
    pos=tensor([0.0530, 0.4550, 0.4400, 0.0520], dtype=torch.float64),
    pos_format={'normalized': True, 'distance': True, 'chromosome_size': 1000, 'circular': True, 'initial_position': 0.005}
)
Test converting some circular distances, first from circular to non-circular:
>>> xf = ReformatPosition(circular=False)
>>> xf((circ_norm, None, None))[0]
SNPSample(
    snp=tensor(...),
    pos=tensor([0.0050, 0.4550, 0.4400, 0.0520], dtype=torch.float64),
    pos_format={'normalized': True, 'distance': True, 'chromosome_size': 1000, 'circular': False, 'initial_position': 0.005}
)
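The arithmetic behind the outputs above can be reproduced with a small pure-Python sketch. It is not the library's code: the helper names are hypothetical, and the circular rule (the first distance becomes the wrap-around gap from the last SNP to the first) is inferred from the example values, where 53 = 5 + (1000 - 952).

```python
from itertools import accumulate


def positions_to_distances(positions, chromosome_size=None, circular=False):
    """Convert increasing absolute positions to distances between neighbours."""
    if not positions:
        return []
    distances = [positions[0]] + [b - a for a, b in zip(positions, positions[1:])]
    if circular:
        if chromosome_size is None:
            raise ValueError("chromosome_size is needed for circular distances")
        # Inferred rule: first distance wraps around from the last SNP.
        distances[0] = positions[0] + (chromosome_size - positions[-1])
    return distances


def distances_to_positions(distances):
    """Invert the linear conversion with a running sum."""
    return list(accumulate(distances))


def normalize(values, chromosome_size):
    """Divide positions or distances by the chromosome size."""
    return [v / chromosome_size for v in values]


positions = [5, 460, 900, 952]
linear = positions_to_distances(positions)                    # [5, 455, 440, 52]
circular = positions_to_distances(positions, 1000, True)      # [53, 455, 440, 52]
restored = distances_to_positions(linear)                     # [5, 460, 900, 952]
normalized = normalize(linear, 1000)                          # approx. [0.005, 0.455, 0.44, 0.052]
```

The running sum in `distances_to_positions` only recovers the original positions when they are in increasing order, which matches the warning above about applying this transform before any reordering transform.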
- name = 'reformat_position'
The user-facing name of the plugin, which can be provided by a user implementing a plugin.
Typically it is automatically the same as the internal Pluggable._name but users are free to provide their own custom name here when implementing a plugin.
- plugin_url = 'py-obj:dnadna.schemas.plugins.transform.reformat_position'
Base URL for all DNADNA plugins.
New plugins’ schemas can be found relative to this URL unless this attribute is explicitly overridden by the class implementing the plugin.
- schema = {'definitions': {'optional-boolean': {'default': None, 'type': ['boolean', 'null']}}, 'description': 'renormalize the position array', 'properties': {'chromosome_size': {'default': None, 'minimum': 1, 'type': ['integer', 'null']}, 'circular': {'$ref': '#/definitions/optional-boolean'}, 'distance': {'$ref': '#/definitions/optional-boolean'}, 'initial_position': {'default': None, 'type': ['integer', 'number', 'null']}, 'normalized': {'$ref': '#/definitions/optional-boolean'}}, 'type': 'object'}
Schema for the plugin’s configuration, if any.
It can be either a string containing the name (without the .yml extension) of a schema in the default schema path (for built-in plugins) or a dict representing the schema.
Not all plugins must have schemas.
For now a class can only be a subclass of one pluggable; in other words a single class cannot provide multiple plugin interfaces.
- class dnadna.transforms.Rotate[source]
Bases:
Transform
Given a sequence, return a random rotation of it along the SNP axis.
- Args:
None
- name = 'rotate'
The user-facing name of the plugin, which can be provided by a user implementing a plugin.
Typically it is automatically the same as the internal Pluggable._name but users are free to provide their own custom name here when implementing a plugin.
- plugin_url = 'py-obj:dnadna.schemas.plugins.transform.rotate'
Base URL for all DNADNA plugins.
New plugins’ schemas can be found relative to this URL unless this attribute is explicitly overridden by the class implementing the plugin.
- schema = {'description': 'apply a random rotation along the SNP axis of a sequence'}
Schema for the plugin’s configuration, if any.
It can be either a string containing the name (without the .yml extension) of a schema in the default schema path (for built-in plugins) or a dict representing the schema.
Not all plugins must have schemas.
For now a class can only be a subclass of one pluggable; in other words a single class cannot provide multiple plugin interfaces.
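A rotation along the SNP axis can be pictured as a cyclic shift of the matrix columns. The sketch below is an illustration only: `rotate_columns` and `random_rotation` are hypothetical helpers, and rotating the position array together with the columns is an assumption.

```python
import random


def rotate_columns(snp, pos, offset):
    """Cyclically shift columns so that column `offset` becomes the first one."""
    k = len(pos)
    if k == 0:
        return snp, pos
    offset %= k
    rotated_snp = [row[offset:] + row[:offset] for row in snp]
    rotated_pos = pos[offset:] + pos[:offset]   # assumption: positions move too
    return rotated_snp, rotated_pos


def random_rotation(snp, pos, rng=random):
    """Pick a uniformly random offset and rotate by it."""
    offset = rng.randrange(len(pos)) if pos else 0
    return rotate_columns(snp, pos, offset)


snp = [[1, 2, 3, 4], [5, 6, 7, 8]]
pos = [10, 20, 30, 40]
shifted_snp, shifted_pos = rotate_columns(snp, pos, 1)
# shifted_snp == [[2, 3, 4, 1], [6, 7, 8, 5]], shifted_pos == [20, 30, 40, 10]
```

After a non-zero rotation the positions are no longer in increasing order, so any step that relies on sorted positions has to run first.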
- class dnadna.transforms.SnpFormat(format='concat')[source]
Bases:
Transform
This transform specifies in what format the SNP matrix and position arrays are combined to form the input to the network.
Currently this can be one of:
concat: the position array and the SNP matrix are concatenated vertically with the position array becoming the first row of the tensor (this is the default, even if this transform is not used explicitly).
product: the SNP matrix is multiplied by the position array, so that each active site has the value of its position, rather than just 1.
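The two formats can be sketched with plain nested lists. This is an illustration under assumptions, not the library's tensor code: `concat_format` and `product_format` are hypothetical helpers, and the product is assumed to be element-wise along the SNP axis.

```python
def concat_format(snp, pos):
    """Stack the position array on top of the SNP matrix as its first row."""
    return [list(pos)] + [list(row) for row in snp]


def product_format(snp, pos):
    """Scale each column by its position, so a 1 becomes the site's position."""
    return [[allele * p for allele, p in zip(row, pos)] for row in snp]


snp = [[0, 1, 0], [1, 1, 0]]
pos = [0.1, 0.5, 0.9]

as_concat = concat_format(snp, pos)
# [[0.1, 0.5, 0.9], [0, 1, 0], [1, 1, 0]]
as_product = product_format(snp, pos)
# [[0.0, 0.5, 0.0], [0.1, 0.5, 0.0]]
```

Note that `concat` adds one row to the input while `product` keeps the shape of the SNP matrix, which affects the input size a network has to accept.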
- name = 'snp_format'
The user-facing name of the plugin, which can be provided by a user implementing a plugin.
Typically it is automatically the same as the internal Pluggable._name but users are free to provide their own custom name here when implementing a plugin.
- plugin_url = 'py-obj:dnadna.schemas.plugins.transform.snp_format'
Base URL for all DNADNA plugins.
New plugins’ schemas can be found relative to this URL unless this attribute is explicitly overridden by the class implementing the plugin.
- schema = {'description': "when loading SNPs from a dataset, specify whether to concatenate the positions array to the SNP matrix or to multiply them together (uses 'concat' by default)", 'properties': {'format': {'enum': ['concat', 'product'], 'type': 'string'}}, 'type': 'object'}
Schema for the plugin’s configuration, if any.
It can be either a string containing the name (without the .yml extension) of a schema in the default schema path (for built-in plugins) or a dict representing the schema.
Not all plugins must have schemas.
For now a class can only be a subclass of one pluggable; in other words a single class cannot provide multiple plugin interfaces.
- class dnadna.transforms.Subsample(size, keep_polymorphic_only=True)[source]
Bases:
Transform
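Subsample a SNP matrix of size (n, k), with n individuals and k SNPs, and return a matrix of size (m, l), with m < n individuals and l <= k SNPs; l can be smaller than k because columns that no longer contain a SNP after subsampling are not kept.
- Parameters:
size (int or pair of ints) – Either a fixed size for the subsamples, or a pair (min, max) giving the range from which a random subsample size is drawn.
keep_polymorphic_only (bool) – If true, sites that are no longer polymorphic after subsampling are removed; if false, they are kept.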
- name = 'subsample'
The user-facing name of the plugin, which can be provided by a user implementing a plugin.
Typically it is automatically the same as the internal Pluggable._name but users are free to provide their own custom name here when implementing a plugin.
- plugin_url = 'py-obj:dnadna.schemas.plugins.transform.subsample'
Base URL for all DNADNA plugins.
New plugins’ schemas can be found relative to this URL unless this attribute is explicitly overridden by the class implementing the plugin.
- schema = {'description': 'take random subsamples of the SNP matrix; the argument is a pair (min, max) of integers giving the range for random sizes of the subsamples, or a single integer giving a fixed size for the subsamples. Use keep_polymorphic_only=False to keep non polymorphic sitesafter subsampling, otherwise they are removed', 'properties': {'keep_polymorphic_only': {'default': True, 'description': 'After subsampling or cropping the individuals dimension of a SNP matrix, if some sites are no longer polymorphic they will be removed if keep_polymorphic_only=True ', 'type': ['boolean']}, 'size': {'oneOf': [{'type': 'array', 'minItems': 2, 'maxItems': 2, 'items': {'type': 'integer', 'minimum': 1}}, {'type': 'integer', 'minimum': 1}]}}, 'type': 'object'}
Schema for the plugin’s configuration, if any.
It can be either a string containing the name (without the .yml extension) of a schema in the default schema path (for built-in plugins) or a dict representing the schema.
Not all plugins must have schemas.
For now a class can only be a subclass of one pluggable; in other words a single class cannot provide multiple plugin interfaces.
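The following sketch shows how subsampling individuals can shrink the SNP axis as well. It is not the dnadna implementation: the helper names are hypothetical, and a site is treated as polymorphic when more than one distinct allele value remains in its column, which is an assumption about the criterion the library uses.

```python
import random


def is_polymorphic(column):
    """Assumed criterion: at least two distinct allele values in the column."""
    return len(set(column)) > 1


def subsample_sketch(snp, pos, size, keep_polymorphic_only=True, rng=random):
    """Pick m rows at random; optionally drop sites that became monomorphic."""
    if isinstance(size, int):
        m = size                        # fixed subsample size
    else:
        low, high = size                # (min, max) range for a random size
        m = rng.randint(low, high)
    m = min(m, len(snp))
    chosen = sorted(rng.sample(range(len(snp)), m))
    rows = [snp[i] for i in chosen]
    if not keep_polymorphic_only:
        return rows, list(pos)
    kept = [j for j in range(len(pos))
            if is_polymorphic([row[j] for row in rows])]
    return [[row[j] for j in kept] for row in rows], [pos[j] for j in kept]


snp = [[0, 1, 1], [0, 0, 1], [0, 1, 1]]
pos = [10, 20, 30]

# Keeping all three individuals makes the result deterministic:
sub_snp, sub_pos = subsample_sketch(snp, pos, size=3)
# Only the middle column varies, so sub_snp == [[1], [0], [1]], sub_pos == [20]
```

Because the number of surviving columns depends on which individuals are drawn, the output shape can differ from sample to sample unless a later step enforces a common size.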
- class dnadna.transforms.Transform[source]
Bases:
Pluggable
Dataset transform.
When loading SNPSamples from the dataset, these transforms are applied to the samples to modify either the position or SNP matrix arrays, or both.
To implement a transform you must provide its __call__ method, which takes as input a tuple consisting of the SNPSample being loaded from the dataset, the parameters being trained as a LearnedParams, and the parameter values associated with the sample’s scenario, as loaded from the Pandas DataFrame.
- classmethod get_schema()[source]
Provide a schema for validating a single transform in a list of transforms in the config file (see the training config schema for example usage).
- name = 'transform'
The user-facing name of the plugin, which can be provided by a user implementing a plugin.
Typically it is automatically the same as the internal Pluggable._name but users are free to provide their own custom name here when implementing a plugin.
- plugin_url = 'py-obj:dnadna.schemas.plugins.transform'
Base URL for all DNADNA plugins.
New plugins’ schemas can be found relative to this URL unless this attribute is explicitly overridden by the class implementing the plugin.
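To make the `__call__` contract more concrete, here is a minimal sketch of a transform-shaped callable. It deliberately avoids importing dnadna: `SampleStandIn` is a hypothetical stand-in for SNPSample, `FlipAlleles` is an invented example, and returning the same three-element tuple is an assumption based on the `xf((sample, None, None))[0]` usage shown for ReformatPosition.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class SampleStandIn:
    """Hypothetical stand-in for SNPSample: a SNP matrix plus positions."""
    snp: List[List[int]]
    pos: List[float]


class FlipAlleles:
    """Illustrative transform: swap 0 and 1 in the SNP matrix."""

    def __call__(self, item):
        # The input is a 3-tuple: sample, learned parameters, scenario values.
        sample, learned_params, scenario_params = item
        flipped = [[1 - allele for allele in row] for row in sample.snp]
        # Assumption: the transform returns a tuple of the same structure.
        return replace(sample, snp=flipped), learned_params, scenario_params


sample = SampleStandIn(snp=[[0, 1], [1, 1]], pos=[0.1, 0.9])
out = FlipAlleles()((sample, None, None))
# out[0].snp == [[1, 0], [0, 0]]; positions and the other entries are unchanged
```

A real transform would subclass Transform so that it is registered as a plugin and can be referenced by name from the config file.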
- exception dnadna.transforms.TransformException(transform)[source]
Bases:
Exception
Exception raised when applying a Transform to an input.
- Parameters:
transform (dnadna.transforms.Transform) – The transform that caused the exception.
- class dnadna.transforms.ValidateSnp(uniform_shape=True)[source]
Bases:
Transform
A special transform that does not actually modify the data, but merely performs certain verifications on it.
If verification fails the data sample will be excluded from batches returned by the data loader.
Currently there is only one verification supported, which is to verify that all SNPs have the same shape (same number of SNPs and individuals).
This can be combined e.g. with Crop to first crop the SNP sizes to a maximum size, then verify that they are of a consistent shape with previous SNPs in the dataset.
- Keyword Arguments:
uniform_shape (bool) – (optional) – Check whether all SNP samples in the dataset have the same shape (same number of SNPs and individuals).
- name = 'validate_snp'
The user-facing name of the plugin, which can be provided by a user implementing a plugin.
Typically it is automatically the same as the internal Pluggable._name but users are free to provide their own custom name here when implementing a plugin.
- plugin_url = 'py-obj:dnadna.schemas.plugins.transform.validate_snp'
Base URL for all DNADNA plugins.
New plugins’ schemas can be found relative to this URL unless this attribute is explicitly overridden by the class implementing the plugin.
- schema = {'description': 'validate the SNP resulting from previous transforms (if any); does not modify the data, but will cause it to be excluded from the batch if validation fails', 'properties': {'uniform_shape': {'default': True, 'type': 'boolean'}}, 'type': 'object'}
Schema for the plugin’s configuration, if any.
It can be either a string containing the name (without the .yml extension) of a schema in the default schema path (for built-in plugins) or a dict representing the schema.
Not all plugins must have schemas.
For now a class can only be a subclass of one pluggable; in other words a single class cannot provide multiple plugin interfaces.
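The validate-then-exclude behaviour can be sketched as follows. This is an illustration, not the dnadna code: `InvalidSampleSketch` stands in for InvalidSNPSample, `UniformShapeValidator` and `build_batch` are hypothetical, and taking the first sample seen as the reference shape is an assumption.

```python
class InvalidSampleSketch(Exception):
    """Hypothetical stand-in for InvalidSNPSample."""


class UniformShapeValidator:
    """Pass samples through unchanged, rejecting any whose shape differs."""

    def __init__(self):
        self.expected_shape = None

    def __call__(self, snp):
        shape = (len(snp), len(snp[0]) if snp else 0)
        if self.expected_shape is None:
            # Assumption: the first sample defines the reference shape.
            self.expected_shape = shape
        elif shape != self.expected_shape:
            raise InvalidSampleSketch(
                f"expected shape {self.expected_shape}, got {shape}")
        return snp


def build_batch(samples, validator):
    """Collect valid samples; a failed verification excludes the sample."""
    batch = []
    for snp in samples:
        try:
            batch.append(validator(snp))
        except InvalidSampleSketch:
            continue
    return batch


samples = [
    [[0, 1], [1, 0]],        # shape (2, 2), becomes the reference
    [[0, 1, 1], [1, 0, 0]],  # shape (2, 3), rejected
    [[1, 1], [0, 0]],        # shape (2, 2), accepted
]
batch = build_batch(samples, UniformShapeValidator())
# batch contains the first and third samples only
```

Placing a cropping step before the validator reduces how many samples are rejected, since oversized samples are trimmed to the common shape first.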